Acta Crystallographica Section D Structural Biology

International Union of Crystallography (IUCr)

All preprints, ranked by how well they match Acta Crystallographica Section D Structural Biology's content profile, based on 54 papers previously published here. The average preprint has a 0.02% match score for this journal, so anything above that is already an above-average fit. Older preprints may already have been published elsewhere.

1
Neutron crystallographic refinement with REFMAC5 of the CCP4 suite

Catapano, L.; Long, F.; Yamashita, K.; Nicholls, R. A.; Steiner, R. A.; Murshudov, G. N.

2023-08-14 bioinformatics 10.1101/2023.08.13.552925 medRxiv
Top 0.1%
52.7%

Hydrogen (H) atoms are abundant in macromolecules and often play critical roles in enzyme catalysis, ligand recognition processes, and protein-protein interactions. However, their direct visualisation by diffraction techniques is challenging. Macromolecular X-ray crystallography affords the localisation of the most ordered H atoms at (sub-)atomic resolution (around 1.2 Å or higher), that is not often attainable. Differently, neutron diffraction methods enable the visualisation of most H atoms, typically in the form of deuterium (D) atoms at much more common resolution values (better than 2.5 Å). Thus, neutron crystallography, although technically demanding, is often the method of choice when direct information on protonation states is sought. REFMAC5 of the Collaborative Computational Project No. 4 (CCP4) is a program for the refinement of macromolecular models against X-ray crystallographic and cryo-EM data. This contribution describes its extension to include the refinement of structural models obtained from neutron crystallographic data. Stereochemical restraints with accurate bond distances between H atoms and their parent atom nuclei are now part of the CCP4 Monomer Library, the source of prior chemical information used in refinement. One new feature for neutron data analysis in REFMAC5 is the refinement of the protium/deuterium (1H/D) fraction. This parameter describes the relative 1H/D contribution to neutron scattering for H atoms. The newly developed REFMAC5 algorithms were tested by performing the (re-)refinement of several entries available in the PDB and of one novel structure (FutA) by using either (i) neutron data-only or (ii) neutron data supplemented by external restraints to a reference X-ray crystallographic structure. Re-refinement with REFMAC5 afforded models characterised by R-factor values that are consistent with, and in some cases better than, the originally deposited values. 
The use of external reference structure restraints during refinement has been observed to be a valuable strategy especially for structures at medium-low resolution. Synopsis: The macromolecular refinement package REFMAC5 of the CCP4 suite has been extended with the incorporation of algorithms for neutron crystallography.

2
Xtricorder: A likelihood-enhanced self-rotation function and application to a machine-learning enhanced Matthews prediction of asymmetric unit copy number

McCoy, A. J.; Read, R. J.

2025-05-26 molecular biology 10.1101/2025.05.22.655506 medRxiv
Top 0.1%
52.4%

Analysis of crystallographic diffraction data before phasing gives the crystallographer a first look at the nature of the problem and the context in which the structure determination will be performed. We here report the development of Xtricorder, an application that targets analysis of crystallographic data specifically for likelihood-based phasing. As well as porting many of the analyses previously available but relatively inaccessible in our Phaser codebase, Xtricorder offers a likelihood-enhanced self-rotation function. A novel and intuitive graphical representation of the self-rotation function presents the results for user inspection, and has the added advantage that, in an adapted form, it is appropriate for training a convolutional neural network to enhance the standard Matthews analysis and more accurately predict the number of copies in the asymmetric unit. We investigate the usefulness of the likelihood-enhanced self-rotation function in first-look analyses, exploring the circumstances under which the self-rotation function results are useful, and discuss the application to AI-generated structure prediction. Synopsis: Xtricorder is a new tool for analysing crystallographic data prior to phasing, featuring a likelihood-enhanced self-rotation function and graphical output that aids both user interpretation and machine learning-based prediction of asymmetric unit content.

3
Redetermination of the first unknown protein MicroED structure by high resolution X-ray diffraction

Xu, H.; Zou, X.; Högbom, M.; Lebrette, H.

2021-04-08 biochemistry 10.1101/2021.04.07.438860 medRxiv
Top 0.1%
44.4%

Microcrystal electron diffraction (MicroED) has the potential to considerably impact the field of structural biology. Indeed, the method can solve atomic structures of a wide range of molecules, beyond the reach of single particle cryo-electron microscopy, exploiting crystals too small for X-ray diffraction (XRD) even using X-ray free-electron lasers. However, until the first unknown protein structure - an R2-like ligand-binding oxidase from Sulfolobus acidocaldarius (SaR2lox) - was recently solved at 3.0 Å resolution, MicroED had only been used to study known protein structures previously obtained by XRD. Here, after adapting sample preparation protocols, the structure of the SaR2lox protein originally solved by MicroED was redetermined by XRD at 2.1 Å resolution. In light of the higher resolution XRD data and taking into account experimental differences of the methods, the quality of the MicroED structure is examined. The analysis demonstrates that MicroED provided an overall accurate model, revealing biologically relevant information specific to SaR2lox, such as the absence of an ether cross-link, but did not allow detection of a ligand visible by XRD in the protein binding pocket. Furthermore, strengths and weaknesses of MicroED compared to XRD are discussed in the perspective of this real-life protein example. The study provides fundaments to help MicroED become a method of choice for solving novel protein structures. Synopsis: The first unknown protein structure solved by microcrystal electron diffraction (MicroED) was recently published. The redetermination by X-ray diffraction of this protein structure provides new insights into the strengths and weaknesses of the promising MicroED method.

4
cctbx.xfel: a suite for processing serial crystallographic data

Brewster, A. S.; Paley, D. W.; Bhowmick, A.; Mittan-Moreau, D. W.; Young, I. D.; Mendez, D.; Tchon, D. M.; Poon, B. K.; Sauter, N. K.

2025-05-04 molecular biology 10.1101/2025.05.04.652045 medRxiv
Top 0.1%
43.6%

The cctbx.xfel suite of processing programs and tools allows fast, visual analysis of serial diffraction images from synchrotrons and XFELs. Built on DIALS and cctbx, cctbx.xfel is designed for real-time and post-experiment processing with a fully featured graphical user interface. Users can quickly identify hit rates, view diffraction patterns, analyze unit-cell isomorphism using clustering, and merge data using a metadata tagging approach that allows on-the-fly organization and visualization of processing results. This paper describes the fundamental algorithms and command-line programs used by cctbx.xfel, including the two main programs: dials.stills_process, which performs spot-finding, indexing, geometric refinement, and integration, and cctbx.xfel.merge, which performs scaling, post-refinement, and merging. A discussion of merging statistics is presented and newer features are described, including random sub-sampling for indexing multi-lattice hits and ΔCC1/2 filtering to remove outliers. Finally, we show a complex, heterogeneous sample containing hexagonal and monoclinic isoforms in P6₃ and P2₁. The isoforms are separated by unit cell clustering, and for each isoform we resolve a (pseudo-)merohedral indexing ambiguity.

5
Decision-making in serial crystallography: a simple test to quickly determine whether sufficient data have been collected

von Stetten, D.; Pearson, A. R.

2025-08-13 biophysics 10.1101/2025.08.12.669835 medRxiv
Top 0.1%
43.3%

In standard rotational data collection for macromolecular crystallography, data are normally collected from a single crystal, and the resulting data processing delivers metrics for data completeness and signal to noise that are well established. However, in serial crystallography it can be difficult to assess quickly whether enough data have been recorded to deliver a well scaled and complete dataset with sufficient signal to noise to address the scientific question being asked. Completeness alone is not an appropriate metric, as a nominally complete dataset can be obtained with a much smaller number of images, and thus multiplicity, than is needed to produce a final dataset with well estimated merged intensity values. Insufficient data result in alarmingly reasonable processing statistics and plausible electron density maps that contain almost no experimental signal, instead being dominated by the phases from the phasing model. We have therefore established a simple electron density-based test to determine whether enough data have been collected, and implemented this in the autoprocessing pipeline at the T-REXX endstation on beamline P14 at PETRA III. Importantly, the results of this test help guide decisions as to whether more data should be collected, or whether the experimenter can move on to a new time-point or sample. Synopsis: We describe a simple test to determine whether sufficient data have been collected during a serial crystallographic experiment, and its incorporation into the autoprocessing pipeline at the T-REXX endstation on beamline P14 at the PETRA III synchrotron.

6
xia2.multiplex: a multi-crystal data analysis pipeline

Gildea, R. J.; Beilsten-Edmands, J.; Axford, D.; Horrell, S.; Aller, P.; Sandy, J.; Sanchez-Weatherby, J.; Owen, C. D.; Lukacik, P.; Strain-Damerell, C. J.; Owen, R. L.; Walsh, M. A.; Winter, G.

2022-01-18 molecular biology 10.1101/2022.01.17.476589 medRxiv
Top 0.1%
41.2%

In macromolecular crystallography radiation damage limits the amount of data that can be collected from a single crystal. It is often necessary to merge data sets from multiple crystals, for example small-wedge data collections on micro-crystals, in situ room-temperature data collections, and collection from membrane proteins in lipidic mesophase. Whilst indexing and integration of individual data sets may be relatively straightforward with existing software, merging multiple data sets from small wedges presents new challenges. Identification of a consensus symmetry can be problematic, particularly in the presence of a potential indexing ambiguity. Furthermore, the presence of non-isomorphous or poor-quality data sets may reduce the overall quality of the final merged data set. To facilitate and help optimise the scaling and merging of multiple data sets, we developed a new program, xia2.multiplex, which takes data sets individually integrated with DIALS and performs symmetry analysis, scaling and merging of multicrystal data sets. xia2.multiplex also performs analysis of various pathologies that typically affect multi-crystal data sets, including non-isomorphism, radiation damage and preferential orientation. After describing a number of use cases, we demonstrate the benefit of xia2.multiplex within a wider autoprocessing framework in facilitating a multi-crystal experiment collected as part of in situ room-temperature fragment screening experiments on the SARS-CoV-2 main protease.

7
Serial Crystallography with Multi-stage Merging of 1000s of Images

Soares, A.; Yamada, Y.; Jakoncic, J.; McSweeney, S.; Sweet, R. M.; Skinner, J.; Foadi, J.; Fuchs, M. R.; Schneider, D. K.; Shi, W.; Andrews, L. C.; Bernstein, H. J.

2022-06-19 bioinformatics 10.1101/141770 medRxiv
Top 0.1%
38.8%

KAMO and Blend provide particularly effective tools to manage automatically the merging of large numbers of datasets from serial crystallography. The requirement for manual intervention in the process can be reduced by extending Blend to support additional clustering options such as use of more accurate cell distance metrics and use of reflection-intensity correlation coefficients to infer "distances" among sets of reflections. This increases the sensitivity to differences in unit cell parameters and allows for clustering to assemble nearly complete datasets on the basis of intensity or amplitude differences. If datasets are already sufficiently complete to permit it, one applies KAMO once and clusters the data using intensities only. If starting from incomplete datasets, one applies KAMO twice, first using cell parameters. In this step we use either the simple cell vector distance of the original Blend, or we use the more sensitive NCDist. This step tends to find clusters of sufficient size so that, when merged, each cluster is sufficiently complete to allow reflection intensities or amplitudes to be compared. One then uses KAMO again using the correlation between the reflections having a common hkl to merge clusters in a way sensitive to structural differences that may not have perturbed the cell parameters sufficiently to make meaningful clusters. Many groups have developed effective clustering algorithms that use a measurable physical parameter from each diffraction still or wedge to cluster the data into categories which then can be merged, one hopes, to yield the electron density from a single protein form. Since these physical parameters are often largely independent from one another, it should be possible to greatly improve the efficacy of data clustering software by using a multi-stage partitioning strategy. Here, we have demonstrated one possible approach to multi-stage data clustering. 
Our strategy is to use unit-cell clustering until merged data is sufficiently complete then to use intensity-based clustering. We have demonstrated that, using this strategy, we are able to accurately cluster datasets from crystals that have subtle differences.

8
Gentle and fast all-atom model refinement to cryo-EM densities via Bayes' approach

Blau, C.; Yvonnesdotter, L.; Lindahl, E.

2022-09-30 biophysics 10.1101/2022.09.30.510249 medRxiv
Top 0.1%
37.2%

Better detectors and automated data collection have generated a flood of high-resolution cryo-EM maps, which in turn has renewed interest in improving methods for determining structure models corresponding to these maps. However, automatically fitting atoms to densities becomes difficult as their resolution increases and the refinement potential has a vast number of local minima. In practice, the problem becomes even more complex when one also wants to achieve a balance between a good fit of atom positions to the map, while also establishing good stereochemistry or allowing protein secondary structure to change during fitting. Here, we present a solution to this challenge using Bayes' approach by formulating the problem as identifying the structure most likely to have produced the observed density map. This allows us to derive a new type of smooth refinement potential - based on relative entropy - in combination with a novel adaptive force scaling algorithm to allow balancing of force-field and density-based potentials. In a low-noise scenario, as expected from modern cryo-EM data, the Bayesian refinement potential outperforms alternatives, and the adaptive force scaling appears to also aid existing refinement potentials. The method is available as a component in the GROMACS molecular simulation toolkit.

9
Rendering protein structures inside cells at the atomic level with Unreal Engine

Chen, M.

2023-12-11 scientific communication and education 10.1101/2023.12.08.570879 medRxiv
Top 0.1%
33.3%

While the recent development of cryogenic electron tomography (CryoET) makes it possible to identify various macromolecules inside cells and determine their structure at near-atomic resolution, it remains challenging to visualize the complex cellular environment at the atomic level. One of the main hurdles in cell visualization is to render the millions of molecules in real time computationally. Here, using a video game engine, we demonstrate the capability of rendering massive biological macromolecules at the atomic level within their native environment. To facilitate the visualization, we also provide tools that help the interactive navigation inside the cells, as well as software that converts protein structures identified using CryoET to a scene that can be explored with the game engine.

10
The Low-Cost, Semi-Automated Shifter Microscope Stage Transforms Speed and Robustness of Manual Protein Crystal Harvesting

Wright, N.; von Delft, F.; Collins, P.; Talon, R.; Nelson, E.; Koekemoer, L.; Ye, M.; Nowak, R.; Newman, J.; Ng, J. T.; Mitrovich, N.; Wiggers, H.

2019-12-20 biophysics 10.1101/2019.12.20.875674 medRxiv
Top 0.1%
28.8%

Despite the tremendous success of X-ray cryocrystallography over recent decades, the transfer of crystals from the drops where they grow to diffractometer sample mounts remains a manual process in almost all laboratories. Here we describe the Shifter, a semi-automated microscope stage that offers an accessible and scalable approach to crystal mounting that exploits the strengths of both humans and machines. The Shifter control software manoeuvres sample drops beneath a hole in a clear protective cover, for human mounting under a microscope. By allowing complete removal of film seals, the tedium of cutting or removing the seal is eliminated. The control software also automatically captures experimental annotations for uploading to the user's data repository, removing the overhead of manual documentation. The Shifter facilitates mounting rates of 100-240 crystals per hour, in a more controlled process than manual mounting, which greatly extends the lifetime of drops and thus allows for a dramatic increase in the number of crystals retrievable from any given drop, without loss of X-ray diffraction quality. In 2015 the first in a series of three Shifter devices was deployed as part of the XChem fragment screening facility at Diamond Light Source (DLS), where they have since facilitated the mounting of over 100,000 crystals. The Shifter was engineered to be simple, allowing for a low-cost device to be commercialised and thus potentially transformative for as many research initiatives as possible. Synopsis: A motorised X/Y microscope stage is presented that combines human fine motor control with machine automation and automated experiment documentation, to transform productivity in protein crystal harvesting.

11
Scaling and merging macromolecular diffuse scattering with mdx2

Meisburger, S. P.; Ando, N.

Top 0.1%
27.4%

Diffuse scattering is a promising method to gain additional insight into protein dynamics from macromolecular crystallography (MX) experiments. Bragg intensities yield the average electron density, while the diffuse scattering can be processed to obtain a three-dimensional reciprocal space map that is further analyzed to determine correlated motion. To make diffuse scattering techniques more accessible, we have created software for data processing called mdx2 that is both convenient to use and simple to extend and modify. Mdx2 is written in Python, and it interfaces with DIALS to implement self-contained data reduction workflows. Data are stored in NeXus format for software interchange and convenient visualization. Mdx2 can be run on the command line or imported as a package, for instance to encapsulate a complete workflow in a Jupyter notebook for reproducible computing and education. Here, we describe mdx2 version 1.0, a new release incorporating state-of-the-art techniques for data reduction. We describe the implementation of a complete multi-crystal scaling and merging workflow, and test the methods using a high-redundancy dataset from cubic insulin. We show that redundancy can be leveraged during scaling to correct systematic errors, and obtain accurate and reproducible measurements of weak diffuse signals. Synopsis: Mdx2 is a Python toolkit for processing diffuse scattering data from macromolecular crystals. We describe multi-crystal scaling and merging procedures implemented in the latest version of mdx2. A high-redundancy dataset from cubic insulin is processed to reveal weak scattering features.

12
A mosaic bulk-solvent model improves density maps and the fit between model and data

Afonine, P. V.; Adams, P. D.; Sobolev, O. V.; Urzhumtsev, A.

2021-12-09 bioinformatics 10.1101/2021.12.09.471976 medRxiv
Top 0.1%
26.5%

Bulk solvent is a major component of bio-macromolecular crystals and therefore contributes significantly to diffraction intensities. Accurate modeling of the bulk-solvent region has been recognized as important for many crystallographic calculations, from computing of R-factors and density maps to model building and refinement. Owing to its simplicity and computational and modeling power, the flat (mask-based) bulk-solvent model introduced by Jiang & Brunger (1994) is used by most modern crystallographic software packages to account for disordered solvent. In this manuscript we describe further developments of the mask-based model that improve the fit between the model and the data and aid in map interpretation. The new algorithm, here referred to as the mosaic bulk-solvent model, considers solvent variation across the unit cell. The mosaic model is implemented in the computational crystallography toolbox and can be used in Phenix in most contexts where accounting for bulk solvent is required. It has been optimized and validated using a sufficiently large subset of the Protein Data Bank entries that have crystallographic data available. Synopsis: A mosaic bulk-solvent method models disordered solvent more accurately than the current flat bulk-solvent model. This improves the fit between the model and the data, improves map quality and allows for the solution of problems previously inaccessible.

13
In the AlphaFold era, when is experimental phasing of protein crystals still required?

Keegan, R.; Simpkin, A. J.; Rigden, D. J.

2024-07-22 bioinformatics 10.1101/2024.07.19.604295 medRxiv
Top 0.1%
26.3%

The availability of highly accurate protein structure predictions from AlphaFold 2 (AF2) and similar tools has hugely expanded the applicability of Molecular Replacement (MR) for crystal structure solution. Many structures solve routinely using raw models, structures processed to remove unreliable parts or models split into distinct structural units. There is therefore an open question around how many and which cases still require experimental phasing methods such as single-wavelength anomalous diffraction (SAD). Here we address the question using a large set of PDB deposits that were solved by SAD. A large majority (87%) solve using unedited or minimally edited AF2 predictions. A further 17 (4%) yield straightforwardly to MR after splitting of the AF2 prediction using SliceNDice, although different splitting methods succeed on slightly different sets of cases. We also find that further unique targets can be solved by alternative modelling approaches such as ESMFold (four cases), alternative MR approaches such as ARCIMBOLDO and AMPLE (two cases each), and multimeric model building with AlphaFold-Multimer or UniFold (three cases). Ultimately, only 12 cases, or 3% of the SAD-phased set did not yield to any form of MR tested here, offering valuable hints as to the number and characteristics of cases where experimental phasing remains essential for macromolecular structure solution.

14
Variable Resolution Maps (VRM) in CCTBX and Phenix: Accounting For Local Resolution In cryoEM

Afonine, P.; Adams, P. D.; Urzhumtsev, A. G.

2026-03-28 bioinformatics 10.64898/2026.03.25.714315 medRxiv
Top 0.1%
26.1%

Calculation of density maps from atomic models is essential for structural studies using crystallography and electron cryo-microscopy (cryoEM). These maps serve various purposes, including atomic model building, refinement, visualization, and validation. However, accurately comparing model-calculated maps to experimental data poses challenges, particularly because the resolution of cryoEM experimental maps varies across the map. Traditional crystallography methods generate finite-resolution maps with uniform resolution throughout the unit cell volume, while most modern software in cryoEM employs Gaussian-like functions to generate these maps, which does not adequately account for atomic model parameters and resolution. Recent work by Urzhumtsev & Lunin (2022, IUCrJ, 9, 728-734) introduces a novel method for computing atomic model maps that incorporate local resolution and can be expressed as analytically differentiable functions of all atomic parameters. This approach enhances the accuracy of matching atomic models to experimental maps. In this paper, we detail the implementation of this method in CCTBX and Phenix. Synopsis: New tools implemented in CCTBX and Phenix allow the calculation of variable-resolution maps through a sum of atomic images expressed as analytic functions of all atomic parameters, along with their associated local resolution.

15
DoRIAT: A Bayesian Framework For Interpreting And Annotating Docking Runs.

Maniatis, C.; Ouaray, Z.; Xiao, K.; Dixon, T. P. E.; Snowden, J.; Teng, M. S.; Hurst, J.

2024-12-05 systems biology 10.1101/2024.12.02.626325 medRxiv
Top 0.1%
25.5%

The advent of sequence-to-structure deep-learning models has transformed the protein engineering landscape by providing an accurate and cost-effective way to determine crystal structures. Despite their accuracy, deep-learning predictions tend to give limited insights into protein dynamics. To improve conformation exploration we have developed a machine learning pipeline that combines deep-learning predictions with molecular docking. In this report, we propose the Docking Run Interpretation and Annotation Tool (DoRIAT). In contrast to frameworks that score models based on interface interactions, DoRIAT uses a set of parameters that summarize binding conformation. We use DoRIAT to score output from docking runs, identify complexes close to the native structure and create ensembles of models with similar binding conformations. Our results demonstrate that the single structural model DoRIAT selects to be the closest representation of the crystal structure lies within the top 10 of docked models, ranked by Root Mean Squared Distance (RMSD), in around 80% of cases.

16
Damaged goods? Evaluating the impact of X-ray damage on conformational heterogeneity in room temperature and cryo-cooled protein crystals

Yabukarski, F.; Doukov, T.; Mokhtari, D. A.; Du, S.; Herschlag, D.

2021-06-28 biophysics 10.1101/2021.06.27.450091 medRxiv
Top 0.1%
24.0%

X-ray crystallography is a cornerstone of biochemistry. Traditional freezing of protein crystals to cryo-temperatures mitigates X-ray damage and facilitates crystal handling but provides an incomplete window into the ensemble of conformations at the heart of protein function and energetics. Room temperature (RT) X-ray crystallography provides more extensive ensemble information, and recent developments allow conformational heterogeneity, the experimental manifestation of ensembles, to be extracted from single crystal data. However, high sensitivity to X-ray damage at RT raises concerns about data reliability. To systematically address this critical question, we obtained increasingly X-ray-damaged high-resolution datasets (1.02-1.52 Å) from single thaumatin, proteinase K, and lysozyme crystals. Heterogeneity analyses indicated a modest increase in conformational disorder with X-ray damage. Nevertheless, these effects do not alter overall conclusions and can be minimized by limiting the extent of X-ray damage or eliminated by extrapolation to obtain heterogeneity information free from X-ray damage effects. To compare these effects to damage at cryo temperature and to learn more about damage and heterogeneity in cryo-cooled crystals, we carried out an analogous analysis of increasingly damaged proteinase K cryo datasets (0.9-1.16 Å). We found X-ray damage-associated heterogeneity changes that were not observed at RT. This observation and the scarcity of reported X-ray doses and damage extent render it difficult to distinguish real from artifactual conformations, including those occurring as a function of temperature. The ability to acquire reliable heterogeneity information from single crystals at RT provides strong motivation for further development and routine implementation of RT X-ray crystallography to obtain conformational ensemble information. 
Significance: X-ray crystallography has allowed biologists to visualize the proteins that carry out complex biological processes and has provided powerful insights into how these molecules function. Our next level of understanding requires information about the ensemble of conformations that is at the heart of protein function and energetics. Prior results have shown that room temperature (RT) X-ray crystallography provides extensive ensemble information, but is subject to extensive X-ray damage. We found that ensemble information with little or no effects from X-ray damage can be collected at RT. We also found that damage effects may be more prevalent than recognized in structures obtained under current standard cryogenic conditions. RT X-ray crystallography can be routinely implemented to obtain needed information about conformational ensembles.

17
Modeling Bias Toward Binding Sites in PDB Structural Models

Wankowicz, S. A.

2024-12-15 biophysics 10.1101/2024.12.14.628518 medRxiv
Top 0.1%
23.3%

The Protein Data Bank (PDB) is one of the richest databases in biology. The structural models deposited have provided insights into protein folds, relationships to evolution, energy functions of structures, and most recently, protein structure prediction, connecting sequence to structure. However, the X-ray crystallography (and cryo-EM) models deposited in the PDB are determined by a combination of refinement algorithms and manual modeling. The intervention of human modeling leads to the possibility that within a single structure, there can be differences in how well parts of a structure are modeled and/or fit the underlying experimental data. We identified that small molecule binding sites are more carefully modeled and better match the underlying experimental data than the rest of the protein structural model. This trend persisted irrespective of the structure's resolution or its overall agreement with the experimental data. The variation of modeling has implications for how we interpret protein structural models and use structural models in explaining mechanisms, structural bioinformatics, simulations, docking, and structure prediction, especially when drawing conclusions about binding sites compared to the rest of the protein.

18
Crystallography of lamin A facilitated by chimeric fusions

Stalmans, G.; Lilina, A. V.; Strelkov, S. V.

2020-02-28 molecular biology 10.1101/2020.02.28.969220 medRxiv
Top 0.1%
23.0%

All proteins of the intermediate filament (IF) family contain the signature central α-helical domain which forms a coiled-coil dimer. Because of its length, past structural studies relied on a divide-and-conquer strategy whereby fragments of this domain were recombinantly produced, crystallized and analysed using X-rays. Here we describe a further development of this approach towards structural studies of the nuclear IF protein lamin. To this end, we have fused lamin A fragments to short N- and C-terminal capping motifs which provide for the correct formation of parallel, in-register coiled-coil dimers. As a result, a chimeric construct containing lamin A residues 17-70 C-terminally capped by the Eb1 domain was solved to 1.83 Å resolution. Another chimera containing lamin A residues 327-403 N-terminally capped by the Gp7 domain was solved to 2.9 Å. In the latter case the capping motif was additionally modified to include a disulphide bridge at the dimer interface. We discuss multiple benefits of fusing coiled-coil dimers with such capping motifs, including a convenient crystallographic phasing by either molecular replacement or sulphur single-wavelength anomalous dispersion (S-SAD) measurements.

19
Scaling and Merging Time-Resolved Laue Data with Variational Inference

Zielinski, K. A.; Dolamore, C.; Wang, H. K.; Henning, R. W.; Wilson, M. A.; Pollack, L.; Srajer, V.; Hekstra, D. R.; Dalton, K. M.

2024-07-31 biophysics 10.1101/2024.07.30.605871 medRxiv
Top 0.1%
22.8%

Time-resolved X-ray crystallography (TR-X) at synchrotrons and free electron lasers is a promising technique for recording dynamics of molecules at atomic resolution. While experimental methods for TR-X have proliferated and matured, data analysis is often difficult. Extracting small, time-dependent changes in signal is frequently a bottleneck for practitioners. Recent work demonstrated that this challenge can be addressed by merging redundant observations with a statistical technique known as variational inference (VI). However, the variational approach to time-resolved data analysis requires identification of successful hyperparameters in order to optimally extract signal. In this case study, we present a successful application of VI to time-resolved changes in an enzyme, DJ-1, upon mixing with a substrate molecule, methylglyoxal. We present a strategy to extract high signal-to-noise changes in electron density from these data. Furthermore, we conduct an ablation study, in which we systematically remove one hyperparameter at a time to demonstrate the impact of each hyperparameter choice on the success of our model. We expect this case study will serve as a practical example of how others may deploy VI to analyze their time-resolved diffraction data.

20
One particle per residue is sufficient to describe all-atom protein structures

Heo, L.; Feig, M.

2023-05-23 biophysics 10.1101/2023.05.22.541652 medRxiv
Top 0.1%
22.7%

Atomistic resolution is considered the standard for high-resolution biomolecular structures, but coarse-grained models are often necessary to reflect limited experimental resolution or to achieve feasibility in computational studies. It is generally assumed that reduced representations involve a loss of detail, accuracy, and transferability. This study explores the use of advanced machine-learning networks to learn from known structures of proteins how to reconstruct atomistic models from reduced representations to assess how much information is lost when the vast knowledge about protein structures is taken into account. The main finding is that highly accurate and stereochemically realistic all-atom structures can be recovered with minimal loss of information from just a single bead per amino acid residue, especially when placed at the side chain center of mass. High-accuracy reconstructions with better than 1 Å heavy-atom root-mean-square deviations are still possible when only Cα coordinates are used as input. This suggests that lower-resolution representations are essentially sufficient to represent protein structures when combined with a machine-learning framework that encodes knowledge from known structures. Practical applications of this high-accuracy reconstruction scheme are illustrated for adding atomistic detail to low-resolution structures from experiment or coarse-grained models generated from computational modeling. Moreover, a rapid, deterministic all-atom reconstruction scheme allows the implementation of an efficient multi-scale framework. As a demonstration, the rapid refinement of accurate models against cryoEM densities is shown where sampling at the coarse-grained level is guided by map correlation functions applied at the atomistic level. With this approach, the accuracy of standard all-atom simulation-based refinement schemes can be matched at a fraction of the computational cost.
STATEMENT OF SIGNIFICANCE
The fundamental insight of this work is that atomistic detail of proteins can be recovered with minimal loss of information from highly reduced representations with just a single bead per amino acid residue. This is possible by encoding the existing knowledge about protein structures in a machine-learning model. This suggests that it is not strictly necessary to resolve structures in atomistic detail in experiments, computational modeling, or the generation of protein conformations via neural networks, since atomistic details can be inferred quickly via the neural network. This increases the relevance of experimental structures obtained at lower resolutions and broadens the impact of coarse-grained modeling.